Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report

نویسندگان

  • Christopher Cieri
  • Mark Liberman
چکیده

Changes in the supply of and demand for language resources continues to affect the role of large data centers such as the Linguistic Data Consortium (LDC) and European Language Resource Center (ELRA) within the research communities they serve. The past few years have seen increased demand for: intensively multi-modal resources, larger data sets in high-density languages and new data in low density languages; standards and tools for corpus development and re-useable resources. The next few years will bring demand for extensive batteries of coordinated language resources with sophisticated annotation in several major languages. The DARPA program in Translingual Information Detection Extraction and Summarization (TIDES) has already undertaken such resource development; programs with similarly broad scope addressing other technologies will surely follow. Data centers will be well placed to address these needs if they integrate new resource development with distribution of existing resources to fill known gaps by creating or assisting the creation of new data. LDC has projects ongoing to address all of these issues. This paper will provide an overview of LDC activity in corpus creation, annotation and distribution and describe new efforts bring together communities of researchers, to identify best practices and develop tools of general use.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

This paper described recent activities of the Linguistic Data Consortium in the collection, annotation and distribution of language data the developments of tools and standards for using that data, the creation of metadata to facilitate the search for linguistic resources.

متن کامل

15 Years of Language Resource Creation and Sharing: a Progress Report on LDC Activities

This paper, the 5 in a series of biennial progress reports, reviews the activities of the Linguistic Data Consortium with particular emphasis on general trends in the language resource landscape and on changes that distinguish the two years since LDC’s last report at LREC from the preceding 8 years. After providing a perspective on the current landscape of language resources, the paper goes on ...

متن کامل

New Directions for Language Resource Development and Distribution

Despite the growth in the number of linguistic data centers around the world, their accomplishments and expansions and the advances they have help enable, the language resources that exist are a small fraction of those required to meet the goals of Human Language Technologies (HLT) for the world’s languages and the promises they offer: broad access to knowledge, direct communication across lang...

متن کامل

Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities

On the Linguistic Data Consortium’s (LDC) 20th anniversary, this paper describes the changes to the language resource landscape over the past two decades, how LDC has adjusted its practice to adapt to them and how the business model continues to grow. Specifically, we will discuss LDC’s evolving roles and changes in the sizes and types of LDC language resources (LR) as well as the data they inc...

متن کامل

The Creation, Distribution and Use of Linguistic Data: the Case of the Linguistic Data Consortium

The Linguistic Data Consortium (LDC) is an open consortium of universities, companies and government research laboratories. It creates and distributes speech and text databases, lexicons and other resources. The University of Pennsylvania is the LDC’s host institution. The LDC was founded in 1992 with a grant from the Defense Advanced Research Projects Agency (DARPA). Currently, all LDC publica...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002